Patent abstract:
APPARATUS AND METHOD FOR GENERATING A PLURALITY OF PARAMETRIC AUDIO STREAMS AND APPARATUS AND METHOD FOR GENERATING A PLURALITY OF SPEAKER SIGNALS. An apparatus (100) for generating a plurality of parametric audio streams (125) (θi, Ψi, Wi) from an input spatial audio signal (105) obtained from a recording in a recording space comprises a segmenter (110) and a generator (120). The segmenter (110) is configured to provide at least two input segmental audio signals (115) (Wi, Xi, Yi, Zi) from the input spatial audio signal (105), wherein the at least two input segmental audio signals (115) (Wi, Xi, Yi, Zi) are associated with corresponding segments (Segi) of the recording space. The generator (120) is configured to generate a parametric audio stream for each of the at least two input segmental audio signals (115) (Wi, Xi, Yi, Zi) to obtain the plurality of parametric audio streams (125) (θi, Ψi, Wi).
Publication number: BR112015011107B1
Application number: R112015011107-6
Filing date: 2013-11-12
Publication date: 2021-05-18
Inventors: Kuech Fabian; Pulkki Ville; Kuntz Achim; Giovanni Del Galdo; Politis Archontis
Applicants: Fraunhofer-Gesellschaft Zur Forderung Der Angewandten Forschung E.V.; Technische Universitat Ilmenau
IPC main class:
Patent description:

DESCRIPTION
TECHNICAL FIELD
[0001] The present invention generally relates to parametric spatial audio processing and, in particular, to an apparatus and a method for generating a plurality of parametric audio streams and an apparatus and a method for generating a plurality of speaker signals. Further applications of the present invention relate to sector-based parametric spatial audio processing.
BACKGROUND OF THE INVENTION
[0002] In multichannel listening, the listener is surrounded by multiple speakers. A variety of known methods exist for capturing audio for these configurations. Let us first consider speaker systems and the spatial impression that can be created with them. With no special techniques, common stereophonic two-channel setups can only create auditory events on the line connecting the speakers. Sound emanating from other directions cannot be produced. Logically, by using more speakers around the listener, more directions can be covered and a more natural spatial impression can be created. The most well-known multichannel speaker system and layout is the 5.1 standard ("ITU-R 775-1"), which consists of five speakers at azimuth angles of 0°, ±30° and ±110° with respect to the listening position. Other systems with a varying number of speakers located in different directions are also known.
[0003] In the art, several different recording methods have been designed for the previously mentioned speaker systems, in order to reproduce, in the listening situation, the spatial impression as it would be perceived in the recording environment. The ideal way to record spatial sound for a chosen multichannel speaker system would be to use the same number of microphones as there are speakers. In this case, the directivity patterns of the microphones should also match the speaker layout so that sound from any single direction would only be recorded with one, two, or three microphones. The more loudspeakers are used, the narrower the required directivity patterns. However, such narrow directional microphones are relatively expensive and typically have a non-flat frequency response, which is not desired. Furthermore, using microphones with overly broad directivity patterns as input for multichannel playback results in a blurred and colored auditory perception, due to the fact that sound emanating from a single direction is always reproduced with more speakers than required. Therefore, current microphones are best suited for two-channel recording and playback, without the goal of a surrounding spatial impression.
[0004] Another known approach to recording spatial sound is to use a large number of microphones distributed over a wide spatial area. For example, when recording an orchestra on a stage, the individual instruments can be picked up by so-called spot microphones, which are positioned close to the sound sources. The spatial distribution of the front sound stage can, for example, be captured by conventional stereo microphones. The sound field components corresponding to late reverberation can be captured by several microphones placed relatively far from the stage. A sound engineer can then mix the desired multichannel output using a combination of all available microphone channels. However, this recording technique entails a very large recording setup and hand-crafted mixing of the recorded channels, which is not always feasible in practice.
[0005] Conventional systems for recording and reproducing spatial audio based on directional audio coding (DirAC), as described in T. Lokki, J. Merimaa, V. Pulkki: Method for Reproducing Natural or Modified Spatial Impression in Multichannel Listening, US Patent 7,787,638 B2, August 31, 2010, and V. Pulkki: Spatial Sound Reproduction with Directional Audio Coding, J. Audio Eng. Soc., Vol. 55, No. 6, pp. 503-516, 2007, rely on a simple global model for the sound field. Thus, they have some systematic disadvantages, which limit the achievable sound quality and practical applicability.
[0006] A general problem with known solutions is that they are relatively complex and typically associated with a degradation of spatial sound quality.
[0007] Thus, it is an object of the present invention to provide an improved concept for parametric spatial audio processing that allows for higher-quality, more realistic spatial sound recording and reproduction using relatively simple and compact microphone configurations.
SUMMARY OF THE INVENTION
[0008] This object is achieved by an apparatus according to claim 1, an apparatus according to claim 13, a method according to claim 15, a method according to claim 16, a computer program according to claim 17, or a computer program according to claim 18.
[0009] According to an embodiment of the present invention, an apparatus for generating a plurality of parametric audio streams from an input spatial audio signal obtained from a recording in a recording space comprises a segmenter and a generator. The segmenter is configured to provide at least two input segmental audio signals from the input spatial audio signal. Here, the at least two input segmental audio signals are associated with corresponding segments of the recording space. The generator is configured to generate a parametric audio stream for each of the at least two input segmental audio signals to obtain the plurality of parametric audio streams.
[0010] The basic idea underlying the present invention is that improved parametric spatial audio processing can be obtained if at least two input segmental audio signals are provided from the input spatial audio signal, wherein the at least two input segmental audio signals are associated with corresponding segments of the recording space, and if a parametric audio stream is generated for each of the at least two input segmental audio signals to obtain the plurality of parametric audio streams. This makes it possible to achieve higher-quality, more realistic spatial sound recording and reproduction using relatively simple and compact microphone setups.
[0011] According to another embodiment, the segmenter is configured to use a directivity pattern for each of the segments of the recording space. Here, the directivity pattern indicates the directivity of the at least two input segmental audio signals. By using directivity patterns, a better match to the observed sound field can be obtained, especially in complex sound scenes.
[0012] According to another embodiment, the generator is configured to obtain the plurality of parametric audio streams, wherein each of the parametric audio streams comprises a component of the at least two input segmental audio signals and corresponding parametric spatial information. For example, the parametric spatial information of each of the parametric audio streams comprises a direction-of-arrival (DOA) parameter and/or a diffuseness parameter. By providing the DOA parameters and/or the diffuseness parameters, the observed sound field can be described in a parametric signal representation domain.
[0013] According to another embodiment, an apparatus for generating a plurality of speaker signals from a plurality of parametric audio streams derived from an input spatial audio signal recorded in a recording space comprises a renderer and a combiner. The renderer is configured to provide a plurality of input segmental speaker signals from the plurality of parametric audio streams. Here, the input segmental speaker signals are associated with corresponding segments of the recording space. The combiner is configured to combine the input segmental speaker signals to obtain the plurality of speaker signals.
[0014] Further embodiments of the present invention provide methods for generating a plurality of parametric audio streams and for generating a plurality of speaker signals.
BRIEF DESCRIPTION OF THE FIGURES
[0015] In the following, embodiments of the present invention will be explained with reference to the attached drawings, in which:
[0016] Figure 1 shows a block diagram of an embodiment of an apparatus for generating a plurality of parametric audio streams from an input spatial audio signal obtained from a recording in a recording space, with a segmenter and a generator;
[0017] Figure 2 shows a schematic illustration of the segmenter of the apparatus embodiment according to Figure 1, based on a mixing or matrixing operation;
[0018] Figure 3 shows a schematic illustration of the segmenter of the apparatus embodiment according to Figure 1, using a directivity pattern;
[0019] Figure 4 shows a schematic illustration of the generator of the apparatus embodiment according to Figure 1, based on a parametric spatial analysis;
[0020] Figure 5 shows a block diagram of an embodiment of an apparatus for generating a plurality of speaker signals from a plurality of parametric audio streams, with a renderer and a combiner;
[0021] Figure 6 shows a schematic illustration of example segments of a recording space, each representing a subset of directions within a two-dimensional (2D) plane or within a three-dimensional (3D) space;
[0022] Figure 7 shows a schematic illustration of an exemplary speaker signal computation for two segments or sectors of a recording space;
[0023] Figure 8 shows a schematic illustration of an exemplary speaker signal computation for two segments or sectors of a recording space using second-order B-format input signals;
[0024] Figure 9 shows a schematic illustration of an exemplary speaker signal computation for two segments or sectors of a recording space, including a modification of the signal in the parametric signal representation domain;
[0025] Figure 10 shows a schematic illustration of exemplary polar patterns of input segmental audio signals provided by the segmenter of the apparatus embodiment according to Figure 1;
[0026] Figure 11 shows a schematic illustration of an exemplary microphone configuration for performing sound field recording; and
[0027] Figure 12 shows a schematic illustration of an exemplary circular array of omnidirectional microphones for obtaining higher-order microphone signals.
DETAILED DESCRIPTION OF EMBODIMENTS
[0028] Before discussing the present invention in more detail using the drawings, it is noted that identical elements, elements having the same function, or elements having the same effect are provided with the same reference numerals in the figures, so that the description of these elements and their functionality as illustrated in the different embodiments is mutually interchangeable or can be applied across the different embodiments.
[0029] Figure 1 shows a block diagram of an embodiment of an apparatus 100 for generating a plurality of parametric audio streams 125 (θi, Ψi, Wi) from an input spatial audio signal 105 obtained from a recording in a recording space, with a segmenter 110 and a generator 120. For example, the input spatial audio signal 105 comprises an omnidirectional signal W and a plurality of different directional signals X, Y, Z, U, V (or X, Y, U, V). As shown in Figure 1, the apparatus 100 comprises a segmenter 110 and a generator 120. For example, the segmenter 110 is configured to provide at least two input segmental audio signals 115 (Wi, Xi, Yi, Zi) from the omnidirectional signal W and the plurality of different directional signals X, Y, Z, U, V of the input spatial audio signal 105, wherein the at least two input segmental audio signals 115 (Wi, Xi, Yi, Zi) are associated with corresponding segments Segi of the recording space. Furthermore, the generator 120 can be configured to generate a parametric audio stream for each of the at least two input segmental audio signals 115 (Wi, Xi, Yi, Zi) to obtain the plurality of parametric audio streams 125 (θi, Ψi, Wi).
[0030] With the apparatus 100 for generating the plurality of parametric audio streams 125, a degradation of the spatial sound quality and relatively complex microphone configurations can both be avoided. In particular, the embodiment of the apparatus 100 according to Figure 1 allows for higher-quality, more realistic spatial sound recording using relatively simple and compact microphone configurations.
[0031] In embodiments, the segments Segi of the recording space represent a subset of directions within a two-dimensional (2D) plane or within a three-dimensional (3D) space.
[0032] In embodiments, the segments Segi of the recording space are characterized by an associated directional measure.
[0033] According to embodiments, the apparatus 100 is configured to perform a sound field recording to obtain the input spatial audio signal 105. For example, the segmenter 110 is configured to divide a full angular range of interest into the segments Segi of the recording space. Here, the segments Segi of the recording space can each cover a reduced angular range compared to the full angular range of interest.
[0034] Figure 2 shows a schematic illustration of the segmenter 110 of the embodiment of the apparatus 100 according to Figure 1, based on a mixing (or matrixing) operation. As exemplarily depicted in Figure 2, the segmenter 110 is configured to generate the at least two input segmental audio signals 115 (Wi, Xi, Yi, Zi) from the omnidirectional signal W and the plurality of different directional signals X, Y, Z, U, V using a mixing or matrixing operation that depends on the segments Segi of the recording space. With the segmenter 110 exemplarily shown in Figure 2, the omnidirectional signal W and the plurality of different directional signals X, Y, Z, U, V constituting the input spatial audio signal 105 can be mapped to the at least two input segmental audio signals 115 (Wi, Xi, Yi, Zi) using a predefined mixing or matrixing operation. This predefined mixing or matrixing operation depends on the segments Segi of the recording space and can essentially be used to branch the at least two input segmental audio signals 115 (Wi, Xi, Yi, Zi) out of the input spatial audio signal 105. The branching of the at least two input segmental audio signals 115 (Wi, Xi, Yi, Zi) by the segmenter 110, which is based on the mixing or matrixing operation, essentially makes it possible to achieve the above-mentioned advantages, as opposed to a simple global model for the sound field.
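The mixing (matrixing) operation can be sketched as a per-sector matrix multiply. The coefficient layout below is purely illustrative (hypothetical values, not the actual mixing coefficients of this patent); it only shows the mechanics of branching segmental signals [Wi, Xi, Yi] out of the input channels [W, X, Y, U, V]:

```python
import numpy as np

# Hypothetical mixing matrix for one sector with preferred direction theta_i.
# Rows: segmental signals [Wi, Xi, Yi]; columns: input channels [W, X, Y, U, V].
theta_i = np.deg2rad(45.0)
a, b = 0.5, 0.5                       # directivity multipliers (illustrative)
c, s = np.cos(theta_i), np.sin(theta_i)
M_i = np.array([
    [a,     b * c, b * s, 0.0,   0.0  ],   # Wi: weighted omni plus velocity
    [a * c, b,     0.0,   b * c, b * s],   # Xi (illustrative coefficients)
    [a * s, 0.0,   b,     b * s, b * c],   # Yi (illustrative coefficients)
])

def segment_signals(b_format, mix_matrix):
    """Branch segmental signals out of the input spatial audio signal.
    b_format: (5, N) array of [W, X, Y, U, V]; mix_matrix: (3, 5)."""
    return np.asarray(mix_matrix) @ np.asarray(b_format)
```

One such matrix per segment Segi yields the per-sector signal sets; since the operation is linear and time-invariant, it can be applied per sample or per time/frequency tile.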
[0035] Figure 3 shows a schematic illustration of the segmenter 110 of the embodiment of the apparatus 100 according to Figure 1, using a (desired or predetermined) directivity pattern 305, qi(ϑ). As exemplarily depicted in Figure 3, the segmenter 110 is configured to use a directivity pattern 305, qi(ϑ), for each of the segments Segi of the recording space. Here, the directivity pattern 305, qi(ϑ), may indicate a directivity of the at least two input segmental audio signals 115 (Wi, Xi, Yi, Zi).
[0036] In embodiments, the directivity pattern 305, qi(ϑ), is given by

qi(ϑ) = a + b · cos(ϑ − ϑi),

[0037] where a and b denote multipliers that can be modified to obtain desired directivity patterns, ϑ denotes an azimuth angle, and ϑi denotes a preferred direction of the i-th segment of the recording space. For example, a is in a range from 0 to 1 and b is in a range from -1 to 1.
[0038] A useful choice of the multipliers a, b might be a = 0.5 and b = 0.5, resulting in the following (cardioid) directivity pattern:

qi(ϑ) = 0.5 + 0.5 · cos(ϑ − ϑi).
[0039] With the segmenter 110 exemplarily described in Figure 3, the at least two input segmental audio signals 115 (Wi, Xi, Yi, Zi), associated with corresponding segments Segi of the recording space, can each be obtained with a predetermined directivity pattern 305, qi(ϑ). It is noted here that using the directivity pattern 305, qi(ϑ), for each of the segments Segi of the recording space improves the spatial sound quality obtained with the apparatus 100.
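The first-order directivity pattern qi(ϑ) = a + b · cos(ϑ − ϑi) can be evaluated directly; a minimal sketch (with a = b = 0.5 the pattern is a cardioid pointing at ϑi):

```python
import numpy as np

def directivity(theta, theta_i, a=0.5, b=0.5):
    """First-order directivity pattern q_i(theta) = a + b*cos(theta - theta_i).
    theta: azimuth(s) in radians; theta_i: preferred direction of segment i."""
    return a + b * np.cos(theta - theta_i)

# With a = b = 0.5 this should give unity gain toward the preferred
# direction and a null toward the opposite direction (cardioid);
# a = 1, b = 0 should give an omnidirectional (constant) pattern.
```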
[0040] Figure 4 shows a schematic illustration of the generator 120 of the embodiment of the apparatus 100 according to Figure 1, based on a parametric spatial analysis. As exemplarily depicted in Figure 4, the generator 120 is configured to obtain the plurality of parametric audio streams 125 (θi, Ψi, Wi). Furthermore, the plurality of parametric audio streams 125 (θi, Ψi, Wi) can comprise a component Wi of the at least two input segmental audio signals 115 (Wi, Xi, Yi, Zi) and corresponding parametric spatial information θi, Ψi.
[0041] In embodiments, the generator 120 can be configured to perform a parametric spatial analysis for each of the at least two input segmental audio signals 115 (Wi, Xi, Yi, Zi) to obtain the corresponding parametric spatial information θi, Ψi.
[0042] In embodiments, the parametric spatial information θi, Ψi of each of the parametric audio streams 125 (θi, Ψi, Wi) comprises a direction-of-arrival (DOA) parameter θi and/or a diffuseness parameter Ψi.
[0043] In embodiments, the direction-of-arrival (DOA) parameter θi and the diffuseness parameter Ψi provided by the generator 120 exemplarily depicted in Figure 4 can constitute the DirAC parameters for parametric spatial audio signal processing. For example, the generator 120 is configured to generate the DirAC parameters (e.g., the DOA parameter θi and the diffuseness parameter Ψi) using a time/frequency representation of the at least two input segmental audio signals 115.
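A simplified per-block DirAC-style analysis on first-order signals might look as follows. This is a sketch, not the exact analysis of the patent: normalization constants and sign conventions for the intensity vector vary between implementations, and the constants are omitted here:

```python
import numpy as np

def dirac_parameters(W, X, Y):
    """Estimate (DOA azimuth, diffuseness) for one time/frequency block.
    W, X, Y: 1-D arrays of STFT coefficients of a first-order signal set."""
    # Active intensity vector (proportional; physical constants omitted).
    Ix = np.real(np.conj(W) * X)
    Iy = np.real(np.conj(W) * Y)
    doa = np.arctan2(np.mean(Iy), np.mean(Ix))   # azimuth of energy flow
    # Diffuseness: 1 - |mean intensity| / mean energy, clipped to [0, 1].
    energy = 0.5 * (np.abs(W) ** 2 + np.abs(X) ** 2 + np.abs(Y) ** 2)
    num = np.hypot(np.mean(Ix), np.mean(Iy))
    psi = 1.0 - num / (np.mean(energy) + 1e-12)
    return float(doa), float(np.clip(psi, 0.0, 1.0))

# Example: a single plane wave from azimuth 45 degrees should yield a DOA
# of 45 degrees and a diffuseness near zero.
az = np.pi / 4
W = np.ones(64, dtype=complex)
doa, psi = dirac_parameters(W, np.cos(az) * W, np.sin(az) * W)
```

In the sector-based scheme, this analysis is simply run once per segmental signal set to yield θi and Ψi for each sector.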
[0044] Figure 5 shows a block diagram of an embodiment of an apparatus 500 for generating a plurality of speaker signals 525 (L1, L2, ...) from a plurality of parametric audio streams 125 (θi, Ψi, Wi), with a renderer 510 and a combiner 520. In the embodiment of Figure 5, the plurality of parametric audio streams 125 (θi, Ψi, Wi) can be derived from an input spatial audio signal (e.g., the input spatial audio signal 105 exemplarily described in the embodiment of Figure 1) recorded in a recording space. As shown in Figure 5, the apparatus 500 comprises a renderer 510 and a combiner 520. For example, the renderer 510 is configured to provide a plurality of input segmental speaker signals 515 from the plurality of parametric audio streams 125 (θi, Ψi, Wi), wherein the input segmental speaker signals 515 are associated with corresponding segments (Segi) of the recording space. Furthermore, the combiner 520 can be configured to combine the input segmental speaker signals 515 to obtain the plurality of speaker signals 525 (L1, L2, ...).
[0045] By providing the apparatus 500 of Figure 5, the plurality of speaker signals 525 (L1, L2, ...) can be generated from the plurality of parametric audio streams 125 (θi, Ψi, Wi), wherein the parametric audio streams 125 (θi, Ψi, Wi) can be transmitted from the apparatus 100 of Figure 1. Furthermore, the apparatus 500 of Figure 5 makes it possible to achieve higher-quality, more realistic spatial sound reproduction using parametric audio streams derived from relatively simple and compact microphone configurations.
[0046] In embodiments, the renderer 510 is configured to receive the plurality of parametric audio streams 125 (θi, Ψi, Wi). For example, the plurality of parametric audio streams 125 (θi, Ψi, Wi) comprises a segmental audio component Wi and corresponding parametric spatial information θi, Ψi. Furthermore, the renderer 510 may be configured to render each of the segmental audio components Wi using the corresponding parametric spatial information 505 (θi, Ψi) to obtain the plurality of input segmental speaker signals 515.
[0047] Figure 6 shows a schematic illustration 600 of example segments Segi (i = 1, 2, 3, 4) 610, 620, 630, 640 of a recording space. In the schematic illustration 600 of Figure 6, the example segments 610, 620, 630, 640 of the recording space represent a subset of directions within a two-dimensional (2D) plane. Alternatively, the segments Segi of the recording space can represent a subset of directions within a three-dimensional (3D) space. For example, segments Segi representing subsets of directions within three-dimensional (3D) space may be similar to the segments 610, 620, 630, 640 exemplarily depicted in Figure 6. In the schematic illustration 600 of Figure 6, four example segments 610, 620, 630, 640 for the apparatus 100 of Figure 1 are exemplarily shown. However, a different number of segments Segi can also be used (i = 1, 2, ..., n, where i is an integer index and n denotes the number of segments). The example segments 610, 620, 630, 640 can be represented in a polar coordinate system (see, for example, Figure 6). For three-dimensional (3D) space, the segments Segi can similarly be represented in a spherical coordinate system.
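A four-sector 2D segmentation like the one in Figure 6 can be sketched by assigning each azimuth to the segment whose preferred direction is closest; the preferred directions used below (45°, 135°, 225°, 315°) are hypothetical, chosen only to give four equal quadrant-like sectors:

```python
import numpy as np

def segment_index(azimuth, preferred):
    """Return the index of the segment whose preferred direction is closest
    to the given azimuth (all angles in radians, wrap-around handled)."""
    diff = np.angle(np.exp(1j * (azimuth - np.asarray(preferred))))
    return int(np.argmin(np.abs(diff)))

# Hypothetical preferred directions of four sectors covering the 2D plane.
preferred = np.deg2rad([45.0, 135.0, 225.0, 315.0])
```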
[0048] In embodiments, the segmenter 110 exemplarily shown in Figure 1 can be configured to use the segments Segi (e.g., the example segments 610, 620, 630, 640 of Figure 6) to provide the at least two input segmental audio signals 115 (Wi, Xi, Yi, Zi). Using segments (or sectors), a segment-based (or sector-based) parametric model of the sound field can be employed. This makes it possible to achieve higher-quality spatial audio recording and playback with a relatively compact microphone setup.
[0049] Figure 7 shows a schematic illustration 700 of an exemplary speaker signal computation for two segments or sectors of a recording space. In the schematic illustration 700 of Figure 7, an embodiment of the apparatus 100 for generating the plurality of parametric audio streams 125 (θi, Ψi, Wi) and an embodiment of the apparatus 500 for generating the plurality of speaker signals 525 (L1, L2, ...) are exemplarily depicted. As shown in the schematic illustration 700 of Figure 7, the segmenter 110 may be configured to receive the input spatial audio signal 105 (e.g., microphone signals). Furthermore, the segmenter 110 may be configured to provide the at least two input segmental audio signals 115 (e.g., segmental microphone signals 715-1 of a first segment and segmental microphone signals 715-2 of a second segment). The generator 120 may comprise a first parametric spatial analysis block 720-1 and a second parametric spatial analysis block 720-2. Furthermore, the generator 120 may be configured to generate the parametric audio stream for each of the at least two input segmental audio signals 115. At the output of the embodiment of the apparatus 100, the plurality of parametric audio streams 125 will be obtained. For example, the first parametric spatial analysis block 720-1 will output a first parametric audio stream 725-1 of a first segment, while the second parametric spatial analysis block 720-2 will output a second parametric audio stream 725-2 of a second segment.
Furthermore, the first parametric audio stream 725-1 provided by the first parametric spatial analysis block 720-1 may comprise parametric spatial information (e.g., θ1, Ψ1) of a first segment and one or more segmental audio signals (e.g., W1) of the first segment, while the second parametric audio stream 725-2 provided by the second parametric spatial analysis block 720-2 may comprise parametric spatial information (e.g., θ2, Ψ2) of a second segment and one or more segmental audio signals (e.g., W2) of the second segment. The embodiment of the apparatus 100 may be configured to transmit the plurality of parametric audio streams 125. As further shown in the schematic illustration 700 of Figure 7, the embodiment of the apparatus 500 may be configured to receive the plurality of parametric audio streams 125 from the embodiment of the apparatus 100. The renderer 510 may comprise a first rendering unit 730-1 and a second rendering unit 730-2. Furthermore, the renderer 510 may be configured to provide the plurality of input segmental speaker signals 515 from the plurality of received parametric audio streams 125. For example, the first rendering unit 730-1 can be configured to provide input segmental speaker signals 735-1 of a first segment from the first parametric audio stream 725-1 of the first segment, while the second rendering unit 730-2 can be configured to provide input segmental speaker signals 735-2 of a second segment from the second parametric audio stream 725-2 of the second segment. Furthermore, the combiner 520 can be configured to combine the input segmental speaker signals 515 to obtain the plurality of speaker signals 525 (e.g., L1, L2, ...).
[0050] The embodiment of Figure 7 essentially represents a concept for higher-quality spatial audio recording and reproduction using a segment-based (or sector-based) parametric model of the sound field, which makes it possible to record even complex spatial audio scenes with a relatively compact microphone setup.
[0051] Figure 8 shows a schematic illustration 800 of an exemplary speaker signal computation for two segments or sectors of a recording space using second-order B-format input signals 105. The exemplary speaker signal computation schematically illustrated in Figure 8 essentially corresponds to the exemplary speaker signal computation schematically illustrated in Figure 7. In the schematic illustration of Figure 8, the embodiment of the apparatus 100 for generating the plurality of parametric audio streams 125 and the embodiment of the apparatus 500 for generating the plurality of speaker signals 525 are exemplarily depicted. As shown in Figure 8, the embodiment of the apparatus 100 can be configured to receive the input spatial audio signal 105 (e.g., B-format microphone channels such as [W, X, Y, U, V]). Here, it should be noted that the U, V signals in Figure 8 are second-order B-format components. The segmenter 110, exemplarily denoted "matrixing", may be configured to generate the at least two input segmental audio signals 115 from the omnidirectional signal and the plurality of different directional signals using a mixing or matrixing operation that depends on the segments Segi of the recording space. For example, the at least two input segmental audio signals 115 may comprise segmental microphone signals 715-1 of a first segment (e.g., [W1, X1, Y1]) and segmental microphone signals 715-2 of a second segment (e.g., [W2, X2, Y2]). Furthermore, the generator 120 may comprise a first direction and diffuseness analysis block 720-1 and a second direction and diffuseness analysis block 720-2. The first and second direction and diffuseness analysis blocks 720-1, 720-2 exemplarily shown in Figure 8 essentially correspond to the first and second parametric spatial analysis blocks 720-1, 720-2 exemplarily shown in Figure 7.
The generator 120 can be configured to generate a parametric audio stream for each of the at least two input segmental audio signals 115 to obtain the plurality of parametric audio streams 125. For example, the generator 120 can be configured to perform a spatial analysis on the segmental microphone signals 715-1 of the first segment using the first direction and diffuseness analysis block 720-1 and to extract a first component (e.g., a segmental audio signal W1) from the segmental microphone signals 715-1 of the first segment to obtain the first parametric audio stream 725-1 of the first segment. Furthermore, the generator 120 can be configured to perform a spatial analysis on the segmental microphone signals 715-2 of the second segment and to extract a second component (e.g., a segmental audio signal W2) from the segmental microphone signals 715-2 of the second segment using the second direction and diffuseness analysis block 720-2 to obtain the second parametric audio stream 725-2 of the second segment. For example, the first parametric audio stream 725-1 of the first segment may comprise parametric spatial information of the first segment, comprising a first direction-of-arrival (DOA) parameter θ1 and a first diffuseness parameter Ψ1, as well as an extracted first component W1, while the second parametric audio stream 725-2 of the second segment may comprise parametric spatial information of the second segment, comprising a second direction-of-arrival (DOA) parameter θ2 and a second diffuseness parameter Ψ2, as well as an extracted second component W2. The embodiment of the apparatus 100 can be configured to transmit the plurality of parametric audio streams 125.
[0052] As further shown in the schematic illustration 800 of Figure 8, the embodiment of the apparatus 500 for generating the plurality of speaker signals 525 can be configured to receive the plurality of parametric audio streams 125 transmitted from the embodiment of the apparatus 100. In the schematic illustration 800 of Figure 8, the renderer 510 comprises the first rendering unit 730-1 and the second rendering unit 730-2. For example, the first rendering unit 730-1 comprises a first multiplier 802 and a second multiplier 804. The first multiplier 802 of the first rendering unit 730-1 can be configured to apply a first weighting factor 803 (e.g., √(1 − Ψ1)) to the segmental audio signal W1 of the first parametric audio stream 725-1 of the first segment to obtain a direct sound substream 810 in the first rendering unit 730-1, while the second multiplier 804 of the first rendering unit 730-1 can be configured to apply a second weighting factor 805 (e.g., √Ψ1) to the segmental audio signal W1 of the first parametric audio stream 725-1 of the first segment to obtain a diffuse substream 812 in the first rendering unit 730-1. Furthermore, the second rendering unit 730-2 may comprise a first multiplier 806 and a second multiplier 808. For example, the first multiplier 806 of the second rendering unit 730-2 may be configured to apply a first weighting factor 807 (e.g., √(1 − Ψ2)) to the segmental audio signal W2 of the second parametric audio stream 725-2 of the second segment to obtain a direct sound substream 814 in the second rendering unit 730-2, while the second multiplier 808 of the second rendering unit 730-2 can be configured to apply a second weighting factor 809 (e.g., √Ψ2) to the segmental audio signal W2 of the second parametric audio stream 725-2 of the second segment to obtain a diffuse substream 816 in the second rendering unit 730-2.
In embodiments, the first and second weighting factors 803, 805, 807, 809 of the first and second rendering units 730-1, 730-2 are derived from the corresponding diffuseness parameters Ψi. According to embodiments, the first rendering unit 730-1 can comprise gain factor multipliers 811, decorrelation processing blocks 813, and combination units 832, while the second rendering unit 730-2 can comprise gain factor multipliers 815, decorrelation processing blocks 817, and combination units 834. For example, the gain factor multipliers 811 of the first rendering unit 730-1 can be configured to apply gain factors obtained from a vector base amplitude panning (VBAP) operation by block 822 to the direct sound substream 810 emitted by the first multiplier 802 of the first rendering unit 730-1. Furthermore, the decorrelation processing blocks 813 of the first rendering unit 730-1 can be configured to apply a decorrelation/gain operation to the diffuse substream 812 at the output of the second multiplier 804 of the first rendering unit 730-1. In addition, the combination units 832 of the first rendering unit 730-1 can be configured to combine the signals obtained from the gain factor multipliers 811 and the decorrelation processing blocks 813 to obtain the segmental speaker signals 735-1 of the first segment. For example, the gain factor multipliers 815 of the second rendering unit 730-2 can be configured to apply gain factors obtained from a vector base amplitude panning (VBAP) operation by block 824 to the direct sound substream 814 emitted by the first multiplier 806 of the second rendering unit 730-2. Furthermore, the decorrelation processing blocks 817 of the second rendering unit 730-2 can be configured to apply a decorrelation/gain operation to the diffuse substream 816 at the output of the second multiplier 808 of the second rendering unit 730-2.
In addition, the combination units 834 of the second rendering unit 730-2 can be configured to combine the signals obtained from the gain factor multipliers 815 and the decorrelation processing blocks 817 to obtain the segmental speaker signals 735-2 of the second segment.
[0053] In applications, the vector base amplitude panning (VBAP) operation by blocks 822, 824 of the first and second rendering units 730-1, 730-2 depends on the corresponding direction-of-arrival (DOA) parameters θi. As exemplarily described in Figure 8, combiner 520 can be configured to combine the input segmental speaker signals 515 to obtain the plurality of speaker signals 525 (e.g., L1, L2, ...). As exemplarily described in Figure 8, combiner 520 may comprise a first summing unit 842 and a second summing unit 844. For example, the first summing unit 842 is configured to sum a first of the segmental speaker signals 735-1 of the first segment and a first of the segmental speaker signals 735-2 of the second segment to obtain a first speaker signal 843. In addition, the second summing unit 844 can be configured to sum a second of the segmental speaker signals 735-1 of the first segment and a second of the segmental speaker signals 735-2 of the second segment to obtain a second speaker signal 845. The first and second speaker signals 843, 845 can constitute the plurality of speaker signals 525. With reference to the application of Figure 8, it should be noted that, for each segment, speaker signals can potentially be generated for all speakers of the reproduction setup.
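The per-segment split into direct and diffuse substreams and the final summation by the combiner can be sketched numerically. The helper below is an illustrative assumption, not part of the claimed apparatus: the function name, the example gain values and the random-phase stand-in for a real decorrelation filter are all choices of this sketch.

```python
import numpy as np

def render_segment(w_i, psi_i, pan_gains, rng):
    """Render one segment audio signal to loudspeaker signals.

    w_i is split by the weighting factors sqrt(1 - psi_i) and sqrt(psi_i)
    into a direct-sound substream and a diffuse substream (cf. Figure 8);
    the direct part is panned with pan_gains, the diffuse part is fed to a
    crude random-phase stand-in for a decorrelation filter.
    """
    direct = np.sqrt(1.0 - psi_i) * w_i
    diffuse = np.sqrt(psi_i) * w_i
    num_ls = len(pan_gains)
    out = np.zeros((num_ls, len(w_i)))
    for l, g in enumerate(pan_gains):
        phase = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, len(w_i)))
        out[l] = g * direct + np.real(diffuse * phase) / np.sqrt(num_ls)
    return out

rng = np.random.default_rng(0)
w1, w2 = np.ones(4), 0.5 * np.ones(4)                  # segment signals W1, W2
seg1 = render_segment(w1, 0.2, [0.8, 0.2, 0.0], rng)   # sector 1, Psi_1 = 0.2
seg2 = render_segment(w2, 0.7, [0.0, 0.3, 0.7], rng)   # sector 2, Psi_2 = 0.7
loudspeakers = seg1 + seg2                             # combiner 520: summation
```

The final line mirrors the summing units 842, 844: the segmental speaker signals of all sectors are simply added per loudspeaker.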
[0054] Figure 9 shows a schematic illustration 900 of an exemplary computational calculation of the speaker signals for two segments or sectors of a recording space, including a modification of the signal in a domain representing the parametric signal. The computational calculation of the exemplary speaker signals in schematic illustration 900 of Figure 9 essentially corresponds to that in schematic illustration 700 of Figure 7. However, the computational calculation in schematic illustration 900 of Figure 9 includes an additional signal modification.
[0055] In the schematic illustration 900 of Figure 9, the apparatus 100 comprises the segmenter 110 and the generator 120 for obtaining the plurality of parametric audio streams 125 (θi, Ψi, Wi). In addition, apparatus 500 comprises renderer 510 and combiner 520 for obtaining the plurality of speaker signals 525.
[0056] For example, the apparatus 100 may further comprise a modifier 910 to modify the plurality of parametric audio streams 125 (θi, Ψi, Wi) in a domain representing the parametric signal. In addition, modifier 910 can be configured to modify at least one of the parametric audio streams 125 (θi, Ψi, Wi) using a corresponding modification control parameter 905. Thus, a first modified parametric audio stream 916 of a first segment and a second modified parametric audio stream 918 of a second segment can be obtained. The first and second modified parametric audio streams 916, 918 may constitute a plurality of modified parametric audio streams 915. In applications, apparatus 100 may be configured to transmit the plurality of modified parametric audio streams 915, and apparatus 500 may be configured to receive the plurality of modified parametric audio streams 915 transmitted from apparatus 100.
[0057] By providing the computational calculation of the exemplary speaker signals according to Figure 9, it is possible to achieve a more flexible spatial audio recording and reproduction scheme. In particular, it is possible to obtain higher quality output signals by applying the modifications in the parametric domain. By segmenting the input signals before generating the plurality of parametric audio representations (streams), a higher spatial selectivity is obtained, which allows different components of the captured sound field to be handled differently.
[0058] Figure 10 shows a schematic illustration 1000 of exemplary polar patterns of input segmental audio signals 115 (e.g., Wi, Xi, Yi) provided by segmenter 110 of the application of apparatus 100 for generating the plurality of parametric audio streams 125 (θi, Ψi, Wi) according to Figure 1. In the schematic illustration 1000 of Figure 10, the exemplary input segmental audio signals 115 are displayed in a respective polar coordinate system for the two-dimensional (2D) plane. Similarly, the exemplary input segmental audio signals 115 can be displayed in a respective spherical coordinate system for three-dimensional (3D) space. Schematic illustration 1000 of Figure 10 exemplarily depicts a first directional response 1010 of a first input segmental audio signal (e.g., an omnidirectional signal Wi), a second directional response 1020 of a second input segmental audio signal (e.g., a first directional signal Xi), and a third directional response 1030 of a third input segmental audio signal (e.g., a second directional signal Yi). In addition, a fourth directional response 1022 with opposite sign compared to the second directional response 1020 and a fifth directional response 1032 with opposite sign compared to the third directional response 1030 are exemplarily described in schematic illustration 1000 of Figure 10. Thus, different directional responses 1010, 1020, 1030, 1022, 1032 (polar patterns) can be used for the input segmental audio signals 115 by the segmenter 110. It is indicated here that the input segmental audio signals 115 may be time and frequency dependent, i.e., Wi = Wi(m, k), Xi = Xi(m, k) and Yi = Yi(m, k), where (m, k) are indices indicating a time/frequency tile in a time/frequency representation of the spatial audio signal.
[0059] In this context, it should be noted that Figure 10 exemplarily describes the polar diagrams for a single set of input signals, that is, the signals 115 for a single sector i (for example, [Wi, Xi, Yi]). In addition, the positive and negative parts of a polar diagram together represent the polar diagram of one signal (for example, parts 1020 and 1022 together show the polar diagram of signal Xi, while parts 1030 and 1032 together show the polar diagram of signal Yi).
[0060] Figure 11 shows a schematic illustration 1100 of an exemplary microphone configuration 1110 for performing a sound field recording. In schematic illustration 1100 of Figure 11, the microphone configuration 1110 may comprise several linear arrays of directional microphones 1112, 1114, 1116. Schematic illustration 1100 of Figure 11 exemplarily depicts how a two-dimensional (2D) observation space can be divided into different segments or sectors 1101, 1102, 1103 (e.g., Segi, i = 1, 2, 3) of the recording space. Here, segments 1101, 1102, 1103 of Figure 11 may correspond to the exemplary segments Segi described in Figure 6. Similarly, the exemplary microphone configuration 1110 may also be used in a three-dimensional (3D) observation space, in which the three-dimensional (3D) observation space can be divided into segments or sectors for a given microphone configuration. In applications, the exemplary microphone configuration 1110 in schematic illustration 1100 of Figure 11 can be used to provide the input spatial audio signal 105 for the application of apparatus 100 according to Figure 1. For example, the multiple linear arrays of directional microphones 1112, 1114, 1116 of the microphone configuration 1110 can be configured to provide the different directional signals for the input spatial audio signal 105. By using the exemplary microphone configuration 1110 of Figure 11, it is possible to optimize the recording quality of spatial audio using the segment-based (or sector-based) parametric model of the sound field.
[0061] In the above applications, the apparatus 100 and the apparatus 500 can be configured to be operative in the time/frequency domain.
[0062] In summary, the applications of the present invention relate to the field of recording and reproducing high quality spatial audio. The use of a parametric model based on segments or sectors of the sound field also allows recording complex spatial audio scenes with relatively compact microphone configurations. In contrast to a single global model of the sound field assumed by current prior art methods, parametric information can be determined for a number of segments into which the entire observation space is divided. Thus, rendering for an almost arbitrary speaker configuration can be performed based on the parametric information along with the recorded audio channels.
[0063] According to the applications, for a flat two-dimensional (2D) sound field recording, the entire azimuth angle range of interest can be divided into multiple sectors or segments, each covering a reduced range of azimuth angles. Analogously, in the 3D case the total solid angle range (azimuth and elevation) can be divided into sectors or segments that cover a smaller angle range. Different sectors or segments may also partially overlap.
[0064] According to the applications, each sector or segment is characterized by an associated directional measurement, which can be used to specify or refer to the corresponding sector or segment. The directional measurement can, for example, be a vector pointing to (or from) the center of the sector or segment, or an azimuth angle in the 2D case, or a pair of an azimuth and an elevation angle in the 3D case. The sector or segment can thus be referred to as a subset of directions within a 2D plane or within 3D space. For simplicity of presentation, the previous examples have been exemplarily described for the 2D case; however, the extension to 3D configurations is straightforward.
[0065] With reference to Figure 6, the directional measurement can be defined as a vector that, for segment Seg3, points from the origin, that is, from the center with the coordinate (0, 0), to the right, that is, in the direction of the coordinate (1, 0) in the polar diagram, or as the azimuth angle of 0° if, in Figure 6, angles are counted from (or referred to) the x-axis (horizontal axis).
[0066] With reference to the application of Figure 1, the apparatus 100 can be configured to receive a number of microphone signals as an input (input spatial audio signal 105). These microphone signals can, for example, either result from a real recording or be artificially generated by a simulated recording in a virtual environment. From these microphone signals, corresponding segmental microphone signals (input segmental audio signals 115) can be determined, which are associated with corresponding segments (Segi). The segmental microphone signals are characterized by specific directional characteristics: their directional enhancement pattern may show significantly higher sensitivity within the associated angular sector compared to the sensitivity outside this sector. An example of the segmentation of a full 360° azimuth range and the enhancement patterns of the associated segmental microphone signals has been illustrated with reference to Figure 6. In the example of Figure 6, the directivities of the microphones associated with the sectors exhibit cardioid patterns that are rotated according to the angular range covered by the corresponding sector. For example, the microphone directivity associated with sector 3 (Seg3) exhibits a cardioid pattern pointing towards 0°. Here it should be noted that, in the polar diagram of Figure 6, the direction of maximum sensitivity is the direction in which the radius of the depicted curve is maximum. Thus, Seg3 has the highest sensitivity for sound components arriving from the right. In other words, segment Seg3 has its preferred direction at the azimuth angle of 0° (assuming angles are counted from the x-axis).
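The rotated cardioid enhancement patterns of Figure 6 can be checked numerically. The sketch below assumes the common cardioid form qi(θ) = 0.5 + 0.5·cos(θ − θi); the exact pattern form and the function name are assumptions of this illustration.

```python
import numpy as np

# Cardioid enhancement pattern of the i-th sector, steered towards azimuth
# theta_i (assumed form; cf. the rotated cardioids described for Figure 6).
def q_i(theta, theta_i):
    return 0.5 + 0.5 * np.cos(theta - theta_i)

theta = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
gains = q_i(theta, 0.0)              # sector pointing towards 0 degrees
preferred = theta[np.argmax(gains)]  # direction of maximum sensitivity
rear_null = q_i(np.pi, 0.0)          # cardioid null at the rear (180 degrees)
```

With this form, the maximum sensitivity lies exactly at the sector's preferred direction θi, and sound arriving from the opposite direction is fully suppressed.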
[0067] According to the applications, for each sector, a DOA parameter (θi) can be determined together with a sector-based diffusion parameter (Ψi). In a simple realization, the diffusion parameter (Ψi) can be the same for all sectors. In principle, any preferred DOA estimation algorithm can be applied (e.g., by generator 120). For example, the DOA parameter (θi) can be interpreted to reflect the direction opposite to that in which most of the sound energy is traveling within the considered sector. Correspondingly, the sector-based diffusion refers to the ratio of the diffuse sound energy to the total sound energy within the considered sector. It should be noted that the parameter estimation (as performed with generator 120) can be performed in a time-varying manner and individually for each frequency range.
[0068] According to the applications, for each sector, a directional audio stream (parametric audio stream) can be composed, including the segmental microphone signal (Wi) and the sector-based DOA and diffusion parameters (θi, Ψi), which predominantly describe the spatial audio properties of the sound field within the angular range represented by this sector. For example, the speaker signals 525 for playback may be determined using the parametric directional information (θi, Ψi) and one or more of the segmental microphone signals of the parametric audio streams 125 (e.g., Wi). Thus, a set of segmental speaker signals 515 can be determined for each segment, which can then be combined by combiner 520 (e.g., summed or mixed) to create the final speaker signals 525 for playback. Direct sound components within a sector can, for example, be rendered as point-type sources by applying vector base amplitude panning (as described in V. Pulkki: Virtual sound source positioning using Vector Base Amplitude Panning. J. Audio Eng. Soc., Vol. 45, pp. 456-466, 1997), whereas diffuse sound can be reproduced from several speakers at the same time.
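For the direct-sound rendering, the referenced vector base amplitude panning can be sketched for the 2D case as follows. This is a minimal illustration of Pulkki's pairwise panning for a single loudspeaker pair; the function name and the constant-power normalization are choices of this sketch.

```python
import numpy as np

def vbap_2d(source_az, ls_az1, ls_az2):
    """2-D vector base amplitude panning for one loudspeaker pair.

    Solves p = L @ g, where the columns of L are the unit direction
    vectors of the two loudspeakers and p is the unit vector of the
    source direction, then normalizes the gains for constant power.
    """
    L = np.array([[np.cos(ls_az1), np.cos(ls_az2)],
                  [np.sin(ls_az1), np.sin(ls_az2)]])
    p = np.array([np.cos(source_az), np.sin(source_az)])
    g = np.linalg.solve(L, p)
    return g / np.linalg.norm(g)

# source at +15 degrees, loudspeakers at +30 and -30 degrees
g = vbap_2d(np.deg2rad(15.0), np.deg2rad(30.0), np.deg2rad(-30.0))
```

As expected, both gains are positive and the loudspeaker closer to the source direction receives the larger gain.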
[0069] The block diagram in Figure 7 illustrates the computational calculation of the speaker signals 525 as described above for the case of two sectors. In Figure 7, bold arrows represent audio signals, whereas thin arrows represent parametric signals or control signals. In Figure 7, the generation of the segmental microphone signals 115 by the segmenter 110, the application of the parametric spatial signal analysis (blocks 720-1, 720-2) for each sector (for example, by the generator 120), the generation of the segmental speaker signals 515 by renderer 510 and the combination of the segmental speaker signals 515 by combiner 520 are schematically illustrated.
[0070] In applications, the segmenter 110 can be configured to perform the generation of the segmental microphone signals 115 from a set of input microphone signals 105. In addition, the generator 120 can be configured to perform the application of the parametric spatial signal analysis for each sector, so that the parametric audio streams 725-1, 725-2 for each sector are obtained. For example, each of the parametric audio streams 725-1, 725-2 can consist of at least one segmental audio signal (e.g., W1, W2, respectively) as well as associated parametric information (e.g., DOA parameters θ1, θ2 and diffusion parameters Ψ1, Ψ2, respectively). The renderer 510 can be configured to perform the generation of the segmental speaker signals 515 for each sector based on the parametric audio streams 725-1, 725-2 generated for the particular sectors. The combiner 520 can be configured to combine the segmental speaker signals 515 to obtain the final speaker signals 525.
[0071] The block diagram in Figure 8 illustrates the computational calculation of the speaker signals 525 for the exemplary case of two sectors, shown as an example for an application with second-order B-format microphone signals. As shown in the application of Figure 8, two (sets of) segmental microphone signals 715-1 (e.g., [W1, X1, Y1]) and 715-2 (e.g., [W2, X2, Y2]) can be generated from an input set of microphone signals 105 by a mixing or matrix operation (e.g., by block 110) as previously described. For each of the two segmental microphone signals, a directional audio analysis (e.g., by blocks 720-1, 720-2) can be performed, producing the directional audio streams 725-1 (e.g., θ1, Ψ1, W1) and 725-2 (e.g., θ2, Ψ2, W2) for the first sector and second sector, respectively.
[0072] In Figure 8, the segmental speaker signals 515 can be generated separately for each sector as follows. The segmental audio component Wi can be divided into two complementary substreams (810 and 812, or 814 and 816, respectively) by weighting with the weighting factors 803, 805, 807, 809 derived from the diffusion parameter Ψi. One substream may carry predominantly direct sound components, whereas the other substream may predominantly carry diffuse sound components. The direct-sound substreams 810, 814 can be rendered using panning gains 811, 815 determined by the DOA parameter θi, whereas the diffuse substreams 812, 816 can be rendered incoherently using the decorrelation processing blocks 813, 817.
[0073] As a last exemplary step, the segmental speaker signals 515 can be combined (e.g., by block 520) to obtain the final output signals 525 for speaker reproduction.
[0074] With reference to the application of Figure 9, it should be mentioned that the estimated parameters (within the parametric audio streams 125) can additionally be modified (e.g., by modifier 910) before the actual speaker signals 525 for the reproduction are determined. For example, the DOA parameter θi can be remapped to achieve a sound scene manipulation. In other cases, the audio signals (e.g., Wi) of certain sectors may be attenuated before calculating the speaker signals 525, if sound arriving from certain or all directions included in these sectors is not desired. Similarly, diffuse sound components can be attenuated if primarily or only direct sound is to be rendered. This processing, including a modification 910 of the parametric audio streams 125, is exemplarily illustrated in Figure 9 for the example of a segmentation into two segments.
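A parametric-domain modification of the kind performed by modifier 910 can be sketched as follows. The function and its parameters are illustrative assumptions; only the two manipulations named above, DOA remapping (scene rotation) and sector attenuation, are shown.

```python
import numpy as np

def modify_stream(doa, psi, w, rotate_by=0.0, sector_gain=1.0):
    """Modify one parametric audio stream (theta_i, Psi_i, W_i) in the
    parametric domain: remap the DOA (e.g., to rotate the sound scene)
    and attenuate the sector's audio signal."""
    new_doa = (doa + rotate_by) % (2.0 * np.pi)
    new_w = sector_gain * np.asarray(w)
    return new_doa, psi, new_w

# rotate the scene by 90 degrees and attenuate the sector by a factor 0.5
doa2, psi2, w2 = modify_stream(np.pi / 4, 0.3, np.ones(4),
                               rotate_by=np.pi / 2, sector_gain=0.5)
```

Because the modification acts on the compact parametric representation rather than on rendered loudspeaker signals, it is independent of the reproduction setup.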
[0075] An application of a sector-based parameter estimation for the exemplary 2D case performed with the previous applications will be described below. It is assumed that the microphone signals used for capturing can be converted into so-called second-order B-format signals. The second-order B-format signals can be described by the shape of the directivity patterns of the corresponding microphones:

w(θ) = 1, x(θ) = cos(θ), y(θ) = sin(θ), u(θ) = cos(2θ), v(θ) = sin(2θ),
[0076] where θ denotes the azimuth angle. The corresponding B-format signals (e.g., input 105 of Figure 8) are denoted by W(m, k), X(m, k), Y(m, k), U(m, k) and V(m, k), where m and k represent a time and frequency index, respectively. The segmental microphone signal associated with the i-th sector is assumed to have a directivity pattern qi(θ). We can then determine (e.g., by block 110) the additional microphone signals 115, Wi(m, k), Xi(m, k), Yi(m, k), having directivity patterns that can be expressed by

wi(θ) = qi(θ), xi(θ) = qi(θ)·cos(θ), yi(θ) = qi(θ)·sin(θ).
[0077] Some examples of the directivity patterns of the described microphone signals in the case of an exemplary cardioid pattern qi(θ) = 0.5 + 0.5·cos(θ − θi) are shown in Figure 10. The preferred direction of the i-th sector depends on the azimuth angle θi. In Figure 10, the dashed lines indicate the directional responses 1022, 1032 (polar patterns) with opposite sign compared to the directional responses 1020, 1030 depicted with solid lines.
[0078] Note that for the exemplary case of θi = 0, the signals Wi(m, k), Xi(m, k), Yi(m, k) can be determined from the second-order B-format signals by mixing the input components W, X, Y, U, V according to

Wi = 0.5·W + 0.5·X,
Xi = 0.25·W + 0.5·X + 0.25·U,
Yi = 0.5·Y + 0.25·V.
[0079] This mixing operation is performed, for example, in Figure 2 in building block 110. Note that a different choice of qi(θ) leads to a different mixing rule to obtain the components Wi, Xi, Yi from the second-order B-format signals.
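Under the cardioid assumption qi(θ) = 0.5 + 0.5·cos θ (sector steered to θi = 0), expanding qi(θ), qi(θ)·cos θ and qi(θ)·sin θ in circular harmonics yields the mixing coefficients used below. The matrix is a sketch derived from that trigonometric expansion, not a verbatim copy of the patent's own equation.

```python
import numpy as np

# Mixing matrix for theta_i = 0 with q_i(theta) = 0.5 + 0.5*cos(theta):
#   W_i = 0.5*W + 0.5*X
#   X_i = 0.25*W + 0.5*X + 0.25*U   (q_i*cos = 0.25 + 0.5*cos + 0.25*cos 2t)
#   Y_i = 0.5*Y + 0.25*V            (q_i*sin = 0.5*sin + 0.25*sin 2t)
M = np.array([[0.50, 0.50, 0.00, 0.00, 0.00],
              [0.25, 0.50, 0.00, 0.25, 0.00],
              [0.00, 0.00, 0.50, 0.00, 0.25]])

# B-format input (W, X, Y, U, V) for a plane wave arriving from azimuth 0
b = np.array([1.0, 1.0, 0.0, 1.0, 0.0])
w_i, x_i, y_i = M @ b
```

For a plane wave from the sector's preferred direction, the segmental signals reproduce the full pattern values qi(0) = 1 and qi(0)·cos 0 = 1, as expected.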
[0080] From the segmental microphone signals 115, Wi(m, k), Xi(m, k), Yi(m, k), we can then determine (for example, by block 120) the DOA parameter θi associated with the i-th sector by computing the sector-based active intensity vector

Ii(m, k) = (1 / (2·ρ0·c)) · Re{ Wi*(m, k) · [Xi(m, k), Yi(m, k)]^T },
[0081] where Re{A} denotes the real part of the complex number A and * denotes the complex conjugate. Furthermore, ρ0 is the density of air and c is the speed of sound. The desired DOA estimate θi(m, k), for example represented by the unit vector ei(m, k), can be obtained by

ei(m, k) = − Ii(m, k) / || Ii(m, k) ||.
[0082] We can further determine a sector-based energy-related quantity of the sound field,

Ei(m, k) = |Wi(m, k)|² + ( |Xi(m, k)|² + |Yi(m, k)|² ) / 2.
[0083] The desired diffusion parameter Ψi(m, k) of the i-th sector can then be determined by

Ψi(m, k) = 1 − g · || E{ Ii(m, k) } || / E{ Ei(m, k) },
[0084] where g denotes an adequate scaling factor, E{·} is the expectation operator and ||·|| denotes the vector norm. It can be shown that the diffusion parameter Ψi(m, k) is zero if only one plane wave is present and has a positive value less than or equal to one in the case of purely diffuse sound fields. In general, an alternative mapping function can be defined for the diffusion that exhibits similar behavior, that is, giving 0 for direct sound only, and approaching 1 for a fully diffuse sound field.
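The sector-based estimation can be illustrated with a simplified numerical sketch. The constants ρ0, c and the scaling g are absorbed into unit factors, |Wi|² serves as the energy measure, and the sign/normalization conventions (with the directivity patterns xi = qi·cos, yi = qi·sin, the vector Re{Wi*·[Xi, Yi]} points towards the source) are assumptions of this sketch, not the patent's exact formulation.

```python
import numpy as np

def estimate_sector_params(Wi, Xi, Yi):
    """Sketch of sector-based DOA and diffuseness estimation from the
    segmental signals W_i, X_i, Y_i (complex arrays over time frames)."""
    Ix = np.real(np.conj(Wi) * Xi)       # intensity components (unit scale)
    Iy = np.real(np.conj(Wi) * Yi)
    Ex, Ey = Ix.mean(), Iy.mean()        # expectation over frames
    doa = np.arctan2(Ey, Ex) % (2.0 * np.pi)
    energy = np.mean(np.abs(Wi) ** 2)    # simplified energy measure
    psi = 1.0 - np.hypot(Ex, Ey) / energy
    return doa, float(np.clip(psi, 0.0, 1.0))

# single plane wave from 60 degrees: diffuseness should be (close to) zero
s = np.exp(1j * np.linspace(0.0, 1.0, 8))
doa, psi = estimate_sector_params(s,
                                  s * np.cos(np.pi / 3),
                                  s * np.sin(np.pi / 3))
```

For the single plane wave, the estimated DOA recovers the 60° arrival direction and the diffuseness is zero, matching the behavior stated above.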
[0085] With reference to the application of Figure 11, an alternative embodiment of the parameter estimation can be used for different microphone configurations. As exemplarily illustrated in Figure 11, multiple linear arrays 1112, 1114, 1116 of directional microphones can be used. Figure 11 also shows an example of how the 2D observation space can be divided into sectors 1101, 1102, 1103 for the given microphone configuration. The segmental microphone signals 115 can be determined by beamforming techniques, such as filter-and-sum beamforming applied to each of the linear microphone arrays 1112, 1114, 1116. The beamforming can also be omitted, i.e., the directional patterns of the directional microphones can be used as the only means to obtain segmental microphone signals 115 that show the desired spatial selectivity for each sector (Segi). The DOA parameter θi within each sector can be estimated using common estimation techniques such as the ESPRIT algorithm (as described in R. Roy and T. Kailath: ESPRIT - estimation of signal parameters via rotational invariance techniques, IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 37, No. 7, pp. 984-995, July 1989). The diffusion parameter Ψi for each sector can, for example, be determined by evaluating the temporal variation of the DOA estimates (as described in J. Ahonen, V. Pulkki: Diffuseness estimation using temporal variation of intensity vectors, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2009, WASPAA '09, pp. 285-288, 18-21 October 2009). Alternatively, the known coherence relationships between different microphones and the direct-to-diffuse sound ratio (as described in O. Thiergart, G. Del Galdo, E. A. P. Habets: Signal-to-reverberant ratio estimation based on the complex spatial coherence between omnidirectional microphones, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012, pp. 309-312, 25-30 March 2012) can be employed.
[0086] Figure 12 shows a schematic illustration 1200 of an exemplary circular array of omnidirectional microphones 1210 for obtaining higher order microphone signals (e.g., the input spatial audio signal 105). In schematic illustration 1200 of Figure 12, the circular array of omnidirectional microphones 1210 comprises, for example, 5 equidistant microphones arranged along a circle (dotted line) in a polar diagram. In applications, the circular array of omnidirectional microphones 1210 can be used to obtain the higher order (HO) microphone signals, as will be noted below. In order to calculate the exemplary second-order microphone signals U and V from the omnidirectional microphone signals (provided by the omnidirectional microphones 1210), at least 5 independent microphone signals must be used. This can be achieved elegantly, for example, using a Uniform Circular Array (UCA) as exemplarily shown in Figure 12. The vector obtained from the microphone signals at a certain time and frequency can, for example, be transformed with a Discrete Fourier Transform (DFT). The microphone signals W, X, Y, U and V (i.e., the input spatial audio signal 105) can then be obtained by a linear combination of the DFT coefficients. Note that the DFT coefficients represent the Fourier series coefficients calculated from the vector of the microphone signals.
[0087] We consider Ym to denote the m-th order generalized microphone signal, defined by the directivity pattern

cm(θ) = e^(jmθ),
[0088] where θ denotes the azimuth angle, so that

W = Y0,  X = (Y1 + Y−1)/2,  Y = (Y1 − Y−1)/(2j),

U = (Y2 + Y−2)/2,  V = (Y2 − Y−2)/(2j).
[0089] It can then be shown that

Ym = Pm / (j^m · Jm(kr)),

[0090] where

Pm = (1/(2π)) · ∫ p(r, φ) · e^(−jmφ) dφ, with the integral taken over φ from 0 to 2π,
[0091] where j is the imaginary unit, r and φ are the polar coordinates, k is the wavenumber, Jm is the Bessel function of the first kind and order m, and Pm are the coefficients of the Fourier series of the pressure signal p measured in the polar coordinates (r, φ).
[0092] Note that care must be taken in the array design and in the implementation of the calculation of the (higher order) B-format signals to avoid excessive noise amplification due to the numerical properties of the Bessel function.
[0093] The mathematical background and the derivations related to the described signal transformation can be found, for example, in A. Kuntz, Wave field analysis using virtual circular microphone arrays, Dr. Hut, 2009, ISBN: 978-3-86853-006-3.
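The DFT-based extraction of the Fourier series coefficients and their linear combination into W, X, Y, U, V can be illustrated at the pattern level. The following toy sketch deliberately omits the radius-dependent Bessel equalization discussed above, and the sign/normalization choices of the harmonic combinations are assumptions of this illustration.

```python
import numpy as np

N = 5                                # five equidistant microphones (UCA)
az = 2.0 * np.pi * np.arange(N) / N

# synthetic pressure samples on the circle whose circular-harmonic content
# matches a plane wave from azimuth phi (pattern level only; array radius
# effects and Bessel equalization are omitted)
phi = np.deg2rad(40.0)
p = 1.0 + np.cos(az - phi) + np.cos(2.0 * (az - phi))

P = np.fft.fft(p) / N                # DFT -> Fourier series coefficients P_m
Pm = {0: P[0], 1: P[1], -1: P[4], 2: P[2], -2: P[3]}

# linear combinations of the DFT coefficients -> B-format style signals
W = Pm[0]
X = Pm[1] + Pm[-1]                   # cos(theta) pattern
Y = (Pm[-1] - Pm[1]) / 1j            # sin(theta) pattern
U = Pm[2] + Pm[-2]                   # cos(2*theta) pattern
V = (Pm[-2] - Pm[2]) / 1j            # sin(2*theta) pattern
```

With 5 microphones, the DFT bins up to order ±2 are available, which is exactly what the second-order signals U and V require, as stated above.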
[0094] Other applications of the present invention relate to a method for generating a plurality of parametric audio streams 125 (θi, Ψi, Wi) from an input spatial audio signal 105 obtained from a recording in a recording space. For example, the input spatial audio signal 105 comprises an omnidirectional signal W and a plurality of different directional signals X, Y, Z, U, V. The method comprises providing at least two input segmental audio signals 115 (Wi, Xi, Yi, Zi) from the input spatial audio signal 105 (for example, from the omnidirectional signal W and the plurality of different directional signals X, Y, Z, U, V), wherein the at least two input segmental audio signals 115 (Wi, Xi, Yi, Zi) are associated with corresponding segments Segi of the recording space. Furthermore, the method comprises generating a parametric audio stream for each of the at least two input segmental audio signals 115 (Wi, Xi, Yi, Zi) to obtain the plurality of parametric audio streams 125 (θi, Ψi, Wi).
[0095] Other applications of the present invention relate to a method for generating a plurality of speaker signals 525 (L1, L2, ...) from a plurality of parametric audio streams 125 (θi, Ψi, Wi) derived from an input spatial audio signal 105 recorded in a recording space. The method comprises providing a plurality of input segmental speaker signals 515 from the plurality of parametric audio streams 125 (θi, Ψi, Wi), wherein the input segmental speaker signals 515 are associated with corresponding segments Segi of the recording space. Furthermore, the method comprises combining the input segmental speaker signals 515 to obtain the plurality of speaker signals 525 (L1, L2, ...).
[0096] Although the present invention has been described in the context of block diagrams, where the blocks represent real or logical hardware components, the present invention can also be implemented as a computer-implemented method. In the latter case, the blocks represent the corresponding method steps, where these steps stand for the functionalities performed by the corresponding logical or physical hardware blocks.
[0097] The described applications are merely illustrative of the principles of the present invention. It is understood that modifications and variations of the arrangements and details described herein will be apparent to those skilled in the art. Thus, the invention is intended to be limited only by the scope of the appended patent claims and not by the specific details presented by way of description and explanation of the applications in this document.
[0098] Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps can be performed by (or using) a hardware device such as a microprocessor, a programmable computer or an electronic circuit. In some applications, one or more of the most important method steps may be performed by such a device.
[0099] The parametric audio streams 125 (θi, Ψi, Wi) can be stored in a digital storage medium or can be transmitted over a transmission medium, such as a wireless transmission medium or a wired transmission medium such as the Internet.
[0100] Depending on certain implementation requirements, the applications of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example, a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are able to cooperate) with a programmable computer system so that the respective method is performed. Thus, the digital storage medium can be computer readable.
[0101] Some applications according to the invention comprise a data carrier having electronically readable control signals, which can cooperate with a programmable computer system, so that one of the methods described in this document is performed.
[0102] In general, the applications of the present invention can be implemented as a computer program product with a program code, the program code being operative to perform one of the methods when the computer program product is executed on a computer. The program code can, for example, be stored on a machine readable carrier.
[0103] Other applications comprise the computer program for performing one of the methods described in this document, stored on a machine readable carrier.
[0104] In other words, an application of the inventive method is thus a computer program having a program code to perform one of the methods described in this document, when the computer program is executed on a computer.
[0105] Another application of the inventive method is therefore a data carrier (or a digital storage medium, or a computer readable medium) comprising, recorded thereon, the computer program for performing one of the methods described in this document. The data carrier, the digital storage medium or the recorded medium is typically tangible and/or non-transitory.
[0106] Another application of the inventive method is then a data stream or a sequence of signals representing the computer program to perform one of the methods described in this document. The data stream or signal sequence can, for example, be configured to be transferred via a data communication connection, for example via the Internet.
[0107] Another application comprises a processing means, for example, a computer or a programmable logic device, configured or adapted to perform one of the methods described in this document.
[0108] Another application comprises a computer having installed thereon the computer program for performing one of the methods described in this document.
[0109] Another application according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program to perform one of the methods described in this document to a receiver. The receiver can, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server to transfer the computer program to the receiver.
[0110] In some applications, a programmable logic device (e.g., a field programmable gate array) can be used to perform some or all of the functionalities of the methods described in this document. In some applications, a field programmable gate array can cooperate with a microprocessor to perform one of the methods described in this document. Generally, the methods are preferably performed by any hardware device.
[0111] The applications of the present invention provide high-quality spatial sound recording and realistic reproduction using simple and compact microphone configurations.
[0112] The applications of the present invention are based on directional audio coding (DirAC) (as described in T. Lokki, J. Merimaa, V. Pulkki: Method for Reproducing Natural or Modified Spatial Impression in Multichannel Listening, U.S. Patent No. 7,787,638 B2, Aug. 31, 2010 and V. Pulkki: Spatial Sound Reproduction with Directional Audio Coding, J. Audio Eng. Soc., Vol. 55, No. 6, pp. 503-516, 2007), which can be used with different microphone systems and with arbitrary speaker configurations. The benefit of DirAC is to reproduce the spatial impression of an existing acoustic environment as precisely as possible using a multichannel speaker system. Within the chosen environment, responses (continuous sound or impulse responses) can be measured with an omnidirectional microphone (W) and with a set of microphones that allows the direction-of-arrival (DOA) of sound and the diffusion of sound to be measured. One possible method is to apply three figure-eight microphones (X, Y, Z) aligned with the corresponding Cartesian coordinate axes. Alternatively, a "SoundField" microphone, which directly produces all the desired responses, can be used. It is interesting to note that the omnidirectional microphone signal represents the sound pressure, while the dipole signals are proportional to the corresponding elements of the particle velocity vector.
[0113] From these signals, the DirAC parameters, that is, the DOA of the sound and the diffusion of the observed sound field, can be measured in an adequate time/frequency raster with a resolution corresponding to the human auditory system. The actual speaker signals can then be determined from the omnidirectional microphone signal based on the DirAC parameters (as described in V. Pulkki: Spatial Sound Reproduction with Directional Audio Coding. J. Audio Eng. Soc., Vol. 55, No. 6, pp. 503-516, 2007). Direct sound components can be reproduced by only a small number of speakers (e.g., one or two) using panning techniques, while diffuse sound components are reproduced from all speakers at the same time.
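As an illustration only, and not the patented method itself, the per-portion estimation of the DirAC parameters from B-format signals can be sketched as follows. Normalization constants between the pressure and velocity channels are omitted, only the 2D case is shown, and the convention X = cos(θ)·W, Y = sin(θ)·W for a plane wave arriving from azimuth θ is assumed.

```python
import numpy as np

def dirac_analysis(W, X, Y, eps=1e-12):
    """Estimate per-bin DOA and diffuseness from 2D B-format STFT bins.

    W, X, Y: complex STFT coefficients, shape (frames, bins).
    Returns: azimuth per time/frequency portion (radians) and a
    diffuseness value per frequency bin in [0, 1].
    Scaling constants between pressure and velocity are omitted.
    """
    # Active intensity vector, proportional to Re{P* u}
    Ix = np.real(np.conj(W) * X)
    Iy = np.real(np.conj(W) * Y)

    # Direction-of-arrival per time/frequency portion
    azimuth = np.arctan2(Iy, Ix)

    # Energy density, proportional to (|P|^2 + ||u||^2) / 2
    E = 0.5 * (np.abs(W) ** 2 + np.abs(X) ** 2 + np.abs(Y) ** 2)

    # Diffuseness: 1 - ||<I>|| / <E>, with temporal averaging over frames
    I_norm = np.hypot(np.mean(Ix, axis=0), np.mean(Iy, axis=0))
    psi = 1.0 - I_norm / (np.mean(E, axis=0) + eps)
    return azimuth, np.clip(psi, 0.0, 1.0)
```

For a single plane wave the diffuseness tends toward zero and the azimuth equals the source direction; for a fully diffuse field the averaged intensity vanishes and the diffuseness approaches one.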
[0114] The applications of the present invention based on DirAC represent a simple approach to recording spatial sound with compact microphone configurations. In particular, the present invention obviates some systematic disadvantages that limit the achievable quality and practical experience in the prior art.
[0115] Unlike conventional DirAC, the applications of the present invention provide a higher quality parametric spatial audio processing. Conventional DirAC relies on a simple global model of the sound field, employing just one DOA and one diffusion parameter for the entire observation space. It is based on the assumption that the sound field can be represented by a single direct sound component, such as a plane wave, plus a global diffusion parameter for each time/frequency portion. In practice, however, this simplified assumption about the sound field generally does not hold. This is especially true in real complex acoustic scenes, for example where multiple sound sources such as talkers or instruments are active at the same time. The applications of the present invention, in contrast, avoid a model mismatch with the observed sound field, so that the corresponding parameter estimates are more accurate. In particular, the mismatch in which direct sound components are rendered diffusely, so that no direction can be perceived when listening to the speaker output, can be prevented. In applications, decorrelators can be used to generate uncorrelated diffuse sound reproduced from all speakers (as described in V. Pulkki: Spatial Sound Reproduction with Directional Audio Coding. J. Audio Eng. Soc., Vol. 55, No. 6, pp. 503-516, 2007). Unlike the prior art, where decorrelators generally introduce an unwanted added ambience effect, the present invention makes it possible to more correctly reproduce sound sources that have a certain spatial extent (as opposed to the simple sound field model of DirAC, which is not able to precisely capture such sound sources).
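The decorrelators referenced above can, purely for illustration, be realized as one random-phase (approximately all-pass) FIR filter per loudspeaker; the filter length and design below are assumptions made for this sketch, not details taken from this document.

```python
import numpy as np

def make_decorrelators(n_speakers, length=512, seed=0):
    """One short random-phase FIR per loudspeaker (flat magnitude response)."""
    rng = np.random.default_rng(seed)
    filters = []
    for _ in range(n_speakers):
        # Unit magnitude, random phase in the frequency domain -> flat spectrum
        phase = rng.uniform(-np.pi, np.pi, length // 2 - 1)
        spectrum = np.concatenate(([1.0], np.exp(1j * phase), [1.0]))
        h = np.fft.irfft(spectrum, n=length)
        filters.append(h)
    return filters

def render_diffuse(signal, filters):
    """Feed one mono diffuse stream to all speakers via decorrelation."""
    n = len(filters)
    gain = 1.0 / np.sqrt(n)  # preserve the total diffuse energy
    return [gain * np.convolve(signal, h, mode="same") for h in filters]
```

Because each filter has unit magnitude at every frequency, the timbre of the diffuse stream is approximately preserved while the loudspeaker feeds become mutually uncorrelated.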
[0116] The applications of the present invention provide a higher number of degrees of freedom in the assumed signal model, allowing a better model fit in complex sound scenes.
[0117] Furthermore, when directional microphones (or any other linear time-invariant, for example physical, means) are used to generate the sectors, a high inherent directivity can be obtained. There is thus less need to apply time-varying gains to avoid vague directions, crosstalk and coloration. This reduces non-linear processing in the audio signal path, resulting in higher quality.
[0118] In general, most direct sound components can be rendered as direct sound sources (point sources/plane wave sources). As a consequence, fewer decorrelation artifacts occur, more (correctly) localizable events are perceived, and a more accurate spatial reproduction is achievable.
[0119] The applications of the present invention provide a high performance of manipulation in the parametric domain, for example, directional filtering (as described in M. Kallinger, H. Ochsenfeld, G. Del Galdo, F. Kuech, D. Mahne, R. Schultz-Amling, and O. Thiergart: A Spatial Filtering Approach for Directional Audio Coding, 126th AES Convention, Paper 7653, Munich, Germany, 2009), compared to the simple global model, since a large fraction of the total signal energy is assigned to direct sound events with a correct DOA associated with them, and a greater amount of information is available. The provision of more (parametric) information allows, for example, separating multiple distinct direct sound components, or separating direct sound components from early reflections impinging from different directions.

Applications of the present invention may comprise the following features. In the 2D case, the total azimuth angle range can be divided into sectors that cover reduced ranges of azimuth angles. In the 3D case, the total solid angle range can be divided into sectors that cover reduced solid angle ranges. Each sector can be associated with a preferred angle range. For each sector, segmental microphone signals can be determined from the received microphone signals, which predominantly contain the sound arriving from the directions that are assigned to/covered by the particular sector. These microphone signals can alternatively be determined artificially by simulating virtual recordings. For each sector, a parametric sound field analysis can be performed to determine directional parameters such as DOA and diffusion. For each sector, the parametric directional information (DOA and diffusion) predominantly describes the spatial properties of the angular range of the sound field that is associated with the particular sector. In the case of playback, for each sector, speaker signals can be determined based on the directional parameters and the segmental microphone signals.
The overall output is then obtained by combining the outputs of all sectors. In the case of manipulation, the estimated parameters and/or segmented audio signals can additionally be modified before calculating the speaker signals for reproduction, in order to achieve a manipulation of the sound scene.
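For a first-order 2D directivity pattern qi(θ) = a + b cos(θ − θi), the segmental microphone signals described above can be formed as a time-invariant linear mix of the B-format channels. The sketch below assumes that sign convention and omits B-format scaling constants; a = b = 0.5 yields cardioid sectors.

```python
import numpy as np

def segment_bformat(W, X, Y, n_sectors=4, a=0.5, b=0.5):
    """Form one segmented omni-like signal per sector from 2D B-format.

    Each sector i has directivity q_i(theta) = a + b*cos(theta - theta_i)
    toward its preferred direction theta_i, realized as a linear,
    time-invariant mix of W, X and Y. Scaling constants are omitted.
    """
    sector_angles = 2 * np.pi * np.arange(n_sectors) / n_sectors
    sectors = []
    for theta_i in sector_angles:
        Wi = a * W + b * (np.cos(theta_i) * X + np.sin(theta_i) * Y)
        sectors.append(Wi)
    return sector_angles, sectors
```

Each sector signal can then be paired with corresponding dipole-like sector signals and analyzed independently, yielding one DOA and one diffusion parameter per sector instead of a single global pair.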
Claims (14)
[0001]
1. An apparatus (100) for generating a plurality of parametric audio streams (125) (θi, Wi, Wi) from an input spatial audio signal (105) obtained from a recording in a recording space, characterized in that the apparatus (100) comprises: a segmenter (110) for generating at least two input segmented audio signals (115) (Wi, Xi, Yi, Zi) from the input spatial audio signal (105); wherein the segmenter (110) is configured to generate the at least two input segmented audio signals (115) (Wi, Xi, Yi, Zi) depending on the corresponding segments (Segi) of the recording space, wherein the segments (Segi) of the recording space represent a subset of directions within a two-dimensional (2D) plane or within a three-dimensional (3D) space and wherein the segments (Segi) are different from each other; and a generator (120) for generating a parametric audio stream for each of the at least two input segmented audio signals (115) (Wi, Xi, Yi, Zi) to obtain the plurality of parametric audio streams (125) (θi, Wi, Wi), such that the plurality of parametric audio streams (125) (θi, Wi, Wi) comprises a component (Wi) of the at least two segmented input audio signals (115) (Wi, Xi, Yi, Zi) and a corresponding parametric spatial information (θi, Wi), wherein the parametric spatial information (θi, Wi) of each of the parametric audio streams (125) (θi, Wi, Wi) comprises a direction-of-arrival (DOA) parameter (θi) and/or a diffusion parameter (Wi).
[0002]
2. The apparatus (100) according to claim 1, characterized in that the segments (Segi) of the recording space are characterized by an associated directional measurement.
[0003]
3. The apparatus (100) according to claim 1 or 2, characterized in that the apparatus (100) is configured to perform a recording of the sound field to obtain the input spatial audio signal (105); wherein the segmenter (110) is configured to divide a full angle range of interest into the segments (Segi) of the recording space; wherein the segments (Segi) of the recording space cover a reduced angle range compared to the full angle range of interest.
[0004]
4. The apparatus (100) according to any one of claims 1 to 3, characterized in that the input spatial audio signal (105) comprises an omnidirectional signal (W) and a plurality of different directional signals (X, Y, Z, U, V).
[0005]
5. The apparatus (100) according to any one of claims 1 to 4, characterized in that the segmenter (110) is configured to generate the at least two segmented input audio signals (115) (Wi, Xi, Yi, Zi) from the omnidirectional signal (W) and the plurality of different directional signals (X, Y, Z, U, V) using a mixing operation that depends on the segments (Segi) of the recording space.
[0006]
6. The apparatus (100) according to any one of claims 1 to 5, characterized in that the segmenter (110) is configured to use a directivity pattern (305) (qi(θ)) for each of the segments (Segi) of the recording space; wherein the directivity pattern (305) (qi(θ)) indicates a directivity of the at least two input segmented audio signals (115) (Wi, Xi, Yi, Zi).
[0007]
7. The apparatus (100) according to claim 6, characterized in that the directivity pattern (305) (qi(θ)) is given by qi(θ) = a + b cos(θ + θi), where a and b denote multipliers that are modified to obtain a desired directivity pattern (305) (qi(θ)); where θ denotes an azimuth angle and θi indicates a preferred direction of the i-th segment of the recording space.
[0008]
8. The apparatus (100) according to any one of claims 1 to 7, characterized in that the generator (120) is configured to perform a parametric spatial analysis for each of the at least two input segmented audio signals (115) (Wi, Xi, Yi, Zi) to obtain the corresponding parametric spatial information (θi, Wi).
[0009]
9. The apparatus (100) according to any one of claims 1 to 8, further comprising: a modifier (910) for modifying the plurality of parametric audio streams (125) (θi, Wi, Wi) in a domain representing the parametric signal; wherein the modifier (910) is configured to modify at least one of the parametric audio streams (125) (θi, Wi, Wi) using a corresponding modification control parameter (905).
[0010]
10. An apparatus (500) for generating a plurality of speaker signals (525) (L1, L2, ...) from a plurality of parametric audio streams (125) (θi, Wi, Wi); characterized in that each of the plurality of parametric audio streams (125) (θi, Wi, Wi) comprises a segmented audio component (Wi) and a corresponding parametric spatial information (θi, Wi); wherein the parametric spatial information (θi, Wi) of each of the parametric audio streams (125) (θi, Wi, Wi) comprises a direction-of-arrival (DOA) parameter (θi) and/or a diffusion parameter (Wi); wherein the apparatus (500) comprises: a renderer (510) for providing a plurality of segmented speaker input signals (515) from the plurality of parametric audio streams (125) (θi, Wi, Wi), so that the segmented speaker input signals (515) depend on the corresponding segments (Segi) of a recording space, wherein the segments (Segi) of the recording space represent a subset of directions within a two-dimensional (2D) plane or within a three-dimensional (3D) space, and wherein the segments (Segi) are different from each other; wherein the renderer (510) is configured to render each of the segmented audio components (Wi) using the corresponding parametric spatial information (505) (θi, Wi) to obtain the plurality of segmented speaker input signals (515); and a combiner (520) for combining the segmented speaker input signals (515) to obtain the plurality of speaker signals (525) (L1, L2, ...).
[0011]
11. A method for generating a plurality of parametric audio streams (125) (θi, Wi, Wi) from an input spatial audio signal (105) obtained from a recording in a recording space, characterized in that the method comprises: generating at least two input segmented audio signals (115) (Wi, Xi, Yi, Zi) from the input spatial audio signal (105); wherein the generation of the at least two input segmented audio signals (115) (Wi, Xi, Yi, Zi) is conducted depending on the corresponding segments (Segi) of the recording space, wherein the segments (Segi) of the recording space represent a subset of directions within a two-dimensional (2D) plane or within a three-dimensional (3D) space, and wherein the segments (Segi) are different from each other; and generating a parametric audio stream for each of the at least two input segmented audio signals (115) (Wi, Xi, Yi, Zi) to obtain the plurality of parametric audio streams (125) (θi, Wi, Wi), such that the plurality of parametric audio streams (125) (θi, Wi, Wi) comprises a component (Wi) of the at least two segmented input audio signals (115) (Wi, Xi, Yi, Zi) and a corresponding parametric spatial information (θi, Wi), wherein the parametric spatial information (θi, Wi) of each of the parametric audio streams (125) (θi, Wi, Wi) comprises a direction-of-arrival (DOA) parameter (θi) and/or a diffusion parameter (Wi).
[0012]
12. A method for generating a plurality of speaker signals (525) (L1, L2, ...) from a plurality of parametric audio streams (125) (θi, Wi, Wi); characterized in that each of the plurality of parametric audio streams (125) (θi, Wi, Wi) comprises a segmented audio component (Wi) and a corresponding parametric spatial information (θi, Wi); wherein the parametric spatial information (θi, Wi) of each of the parametric audio streams (125) (θi, Wi, Wi) comprises a direction-of-arrival (DOA) parameter (θi) and/or a diffusion parameter (Wi); wherein the method comprises: providing a plurality of segmented speaker input signals (515) from the plurality of parametric audio streams (125) (θi, Wi, Wi), so that the segmented speaker input signals (515) depend on the corresponding segments (Segi) of a recording space, wherein the segments (Segi) of the recording space represent a subset of directions within a two-dimensional (2D) plane or within a three-dimensional (3D) space, and wherein the segments (Segi) are different from each other; wherein the provision of the plurality of segmented speaker input signals (515) is conducted by rendering each of the segmented audio components (Wi) using the corresponding parametric spatial information (505) (θi, Wi) to obtain the plurality of segmented speaker input signals (515); and combining the segmented speaker input signals (515) to obtain the plurality of speaker signals (525) (L1, L2, ...).
[0013]
13. Non-transient storage medium having recorded instructions for execution on a computer, having a program code for performing the method according to claim 11, characterized in that it comprises instructions which, when executed, perform the method on a computer.
[0014]
14. Non-transient storage medium having recorded instructions for execution on a computer, having a program code for performing the method according to claim 12, characterized in that it comprises instructions which, when executed, perform the method on a computer.
Similar technologies:
Publication number | Publication date | Patent title
BR112015011107B1|2021-05-18|apparatus and method for generating a plurality of parametric audio streams and apparatus and method for generating a plurality of speaker signals
US11217258B2|2022-01-04|Method and device for decoding an audio soundfield representation
US9271081B2|2016-02-23|Method and device for enhanced sound field reproduction of spatially encoded audio input signals
US8611550B2|2013-12-17|Apparatus for determining a converted spatial audio signal
Pelzer et al.2012|Auralization of a virtual orchestra using directivities of measured symphonic instruments
Farina et al.2016|Measuring spatial mimo impulse responses in rooms employing spherical transducer arrays
WO2019168083A1|2019-09-06|Acoustic signal processing device, acoustic signal processing method, and acoustic signal processing program
AU2016204408B2|2017-11-23|Method and device for decoding an audio soundfield representation for audio playback
Pajunen2019|Effects of a rigid spherical scatterer in spatial audio reproduction fields
AU2014265108B2|2016-06-30|Method and device for decoding an audio soundfield representation for audio playback
Dickins et al.2021|Validation of a Practical Spatial Soundfield Reproduction System Using a Directional Microphone
Kaiser2011|A hybrid approach for three-dimensional sound spatialization
CN113994716A|2022-01-28|Signal processing device and method, and program
BR112015010995B1|2021-09-21|ADJUSTMENT BY SEGMENT OF THE SPATIAL AUDIO SIGNAL FOR DIFFERENT CONFIGURATION OF THE PLAYBACK SPEAKERS
Patent family:
Publication number | Publication date
MX341006B|2016-08-03|
MX2015006128A|2015-08-05|
KR20150104091A|2015-09-14|
JP2016502797A|2016-01-28|
CA2891087C|2018-01-23|
TW201426738A|2014-07-01|
BR112015011107A2|2017-10-24|
US10313815B2|2019-06-04|
ES2609054T3|2017-04-18|
CN104904240A|2015-09-09|
EP2904818B1|2016-09-28|
CN104904240B|2017-06-23|
TWI512720B|2015-12-11|
CA2891087A1|2014-05-22|
RU2633134C2|2017-10-11|
US20150249899A1|2015-09-03|
WO2014076058A1|2014-05-22|
KR101715541B1|2017-03-22|
EP2904818A1|2015-08-12|
JP5995300B2|2016-09-21|
EP2733965A1|2014-05-21|
RU2015122630A|2017-01-10|
AR093509A1|2015-06-10|
Cited references:
Publication number | Filing date | Publication date | Applicant | Patent title

JPH04158000A|1990-10-22|1992-05-29|Matsushita Electric Ind Co Ltd|Sound field reproducing system|
JP3412209B2|1993-10-22|2003-06-03|日本ビクター株式会社|Sound signal processing device|
US6021206A|1996-10-02|2000-02-01|Lake Dsp Pty Ltd|Methods and apparatus for processing spatialised audio|
FI118247B|2003-02-26|2007-08-31|Fraunhofer Ges Forschung|Method for creating a natural or modified space impression in multi-channel listening|
GB2410164A|2004-01-16|2005-07-20|Anthony John Andrews|Sound feature positioner|
MXPA06011359A|2004-04-05|2007-01-16|Koninkl Philips Electronics Nv|Multi-channel encoder.|
JP5513887B2|2006-09-14|2014-06-04|コーニンクレッカフィリップスエヌヴェ|Sweet spot operation for multi-channel signals|
US20080232601A1|2007-03-21|2008-09-25|Ville Pulkki|Method and apparatus for enhancement of audio reconstruction|
WO2009126561A1|2008-04-07|2009-10-15|Dolby Laboratories Licensing Corporation|Surround sound generation from a microphone array|
EP2154910A1|2008-08-13|2010-02-17|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Apparatus for merging spatial audio streams|
EP2249334A1|2009-05-08|2010-11-10|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Audio format transcoder|
EP2346028A1|2009-12-17|2011-07-20|Fraunhofer-Gesellschaft zur Förderung der Angewandten Forschung e.V.|An apparatus and a method for converting a first parametric spatial audio signal into a second parametric spatial audio signal|
US9552840B2|2010-10-25|2017-01-24|Qualcomm Incorporated|Three-dimensional sound capturing and reproducing with multi-microphones|
CN202153724U|2011-06-23|2012-02-29|四川软测技术检测中心有限公司|Active combination loudspeaker|
FR3018026B1|2014-02-21|2016-03-11|Sonic Emotion Labs|METHOD AND DEVICE FOR RETURNING A MULTICANAL AUDIO SIGNAL IN A LISTENING AREA|
CN105376691B|2014-08-29|2019-10-08|杜比实验室特许公司|The surround sound of perceived direction plays|
CN105992120B|2015-02-09|2019-12-31|杜比实验室特许公司|Upmixing of audio signals|
CN107290711A|2016-03-30|2017-10-24|芋头科技(杭州)有限公司|A kind of voice is sought to system and method|
EP3297298B1|2016-09-19|2020-05-06|A-Volute|Method for reproducing spatially distributed sounds|
US10187740B2|2016-09-23|2019-01-22|Apple Inc.|Producing headphone driver signals in a digital audio signal processing binaural rendering environment|
GB2559765A|2017-02-17|2018-08-22|Nokia Technologies Oy|Two stage audio focus for spatial audio processing|
US9820073B1|2017-05-10|2017-11-14|Tls Corp.|Extracting a common signal from multiple audio signals|
US20210050028A1|2018-01-26|2021-02-18|Lg Electronics Inc.|Method for transmitting and receiving audio data and apparatus therefor|
CN111819862B|2018-03-14|2021-10-22|华为技术有限公司|Audio encoding apparatus and method|
GB2572420A|2018-03-29|2019-10-02|Nokia Technologies Oy|Spatial sound rendering|
US20190324117A1|2018-04-24|2019-10-24|Mediatek Inc.|Content aware audio source localization|
EP3618464A1|2018-08-30|2020-03-04|Nokia Technologies Oy|Reproduction of parametric spatial audio using a soundbar|
GB201818959D0|2018-11-21|2019-01-09|Nokia Technologies Oy|Ambience audio representation and associated rendering|
Legal status:
2017-10-31| B15I| Others concerning applications: loss of priority|
2018-11-21| B06F| Objections, documents and/or translations needed after an examination request according [chapter 6.6 patent gazette]|
2020-01-14| B06U| Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]|
2021-03-09| B09A| Decision: intention to grant [chapter 9.1 patent gazette]|
2021-05-18| B16A| Patent or certificate of addition of invention granted|Free format text: TERM OF VALIDITY: 20 (TWENTY) YEARS COUNTED FROM 12/11/2013, SUBJECT TO THE LEGAL CONDITIONS. |
Priority:
Application number | Filing date | Patent title
US201261726887P| true| 2012-11-15|2012-11-15|
US61/726,887|2012-11-15|
EP13159421.0A|EP2733965A1|2012-11-15|2013-03-15|Apparatus and method for generating a plurality of parametric audio streams and apparatus and method for generating a plurality of loudspeaker signals|
EP13159421.0|2013-03-15|
PCT/EP2013/073574|WO2014076058A1|2012-11-15|2013-11-12|Apparatus and method for generating a plurality of parametric audio streams and apparatus and method for generating a plurality of loudspeaker signals|